59 research outputs found

    Identifying Risk Factors for Severe Childhood Malnutrition by Boosting Additive Quantile Regression

    Ordinary linear and generalized linear regression models relate the mean of a response variable to a linear combination of covariate effects and, as a consequence, focus on average properties of the response. Analyzing childhood malnutrition in developing or transition countries based on such a regression model implies that the estimated effects describe the average nutritional status. However, it is of even larger interest to analyze quantiles of the response distribution such as the 5% or 10% quantile that relate to the risk of children for extreme malnutrition. In this paper, we analyze data on childhood malnutrition collected in the 2005/2006 India Demographic and Health Survey based on a semiparametric extension of quantile regression models where nonlinear effects are included in the model equation, leading to additive quantile regression. The variable selection and model choice problems associated with estimating an additive quantile regression model are addressed by a novel boosting approach. Based on this rather general class of statistical learning procedures for empirical risk minimization, we develop, evaluate and apply a boosting algorithm for quantile regression. Our proposal allows for data-driven determination of the amount of smoothness required for the nonlinear effects and combines model selection with an automatic variable selection property. The results of our empirical evaluation suggest that boosting is an appropriate tool for estimation in linear and additive quantile regression models and helps to identify yet unknown risk factors for childhood malnutrition

    Detection of risk factors for obesity in early childhood with quantile regression methods for longitudinal data

    This article compares and discusses three different statistical methods for investigating risk factors for overweight and obesity in early childhood by means of the LISA study, a recent German birth cohort study with 3097 children. Since the definition of overweight and obesity is typically based on upper quantiles (90% and 97%) of the age specific body mass index (BMI) distribution, our aim was to model the influence of risk factors and age on these quantiles while as far as possible taking the longitudinal data structure into account. The following statistical regression models were chosen: additive mixed models, generalized additive models for location, scale and shape (GAMLSS), and distribution free quantile regression models. The methods were compared empirically by cross-validation and for the data at hand no model could be rated superior. Motivated by previous studies we explored whether there is an age-specific skewness of the BMI distribution. The investigated data does not suggest such an effect, even after adjusting for risk factors. Concerning risk factors, our results mainly confirm results obtained in previous studies. From a methodological point of view, we conclude that GAMLSS and distribution free quantile regression are promising approaches for longitudinal quantile regression, requiring, however, further extensions to fully account for longitudinal data structures

    GAMLSS for high-dimensional data – a flexible approach based on boosting

    Generalized additive models for location, scale and shape (GAMLSS) are a popular semi-parametric modelling approach that, in contrast to conventional GAMs, regress not only the expected mean but every distribution parameter (e.g. location, scale and shape) on a set of covariates. Current fitting procedures for GAMLSS are infeasible for high-dimensional data setups and require variable selection based on (potentially problematic) information criteria. The present work describes a boosting algorithm for high-dimensional GAMLSS that was developed to overcome these limitations. Specifically, the new algorithm was designed to allow the simultaneous estimation of predictor effects and variable selection. The proposed algorithm was applied to data of the Munich Rental Guide, which is used by landlords and tenants as a reference for the average rent of a flat depending on its characteristics and spatial features. The net-rent predictions that resulted from the high-dimensional GAMLSS were found to be highly competitive while covariate-specific prediction intervals showed a major improvement over classical GAMs.

    Structured additive quantile regression with applications to modelling undernutrition and obesity of children

    Quantile regression makes it possible to model the complete conditional distribution of a response variable - expressed by its quantiles - depending on covariates, and thereby extends classical regression models which mainly address the conditional mean of a response variable. The present thesis introduces the generic model class of structured additive quantile regression. This model class combines quantile regression with a structured additive predictor and thereby enables a variety of covariate effects to be flexibly modelled. Among other components, the structured additive predictor comprises smooth non-linear effects of continuous covariates and individual-specific effects which are particularly important in longitudinal data settings. Furthermore, this thesis gives an extensive overview of existing approaches for parameter estimation in structured additive quantile regression models. These approaches are structured into distribution-free and distribution-based approaches as well as related model classes. Each approach is systematically discussed with regard to four previously defined criteria: (i) which different components of the generic predictor can be estimated, (ii) which properties can be attributed to the estimators, (iii) whether variable selection is possible, and, finally, (iv) whether software is available for practical applications. The main methodological development of this thesis is a boosting algorithm which is presented as an alternative estimation approach for structured additive quantile regression. The discussion of this innovative approach with respect to the four criteria points out that quantile boosting offers great advantages regarding almost all criteria - in particular regarding variable selection. In addition, the results of several simulation studies provide a practical comparison of boosting with alternative estimation approaches.
    From the beginning of this thesis, the development of structured additive quantile regression is motivated by two relevant applications from the field of epidemiology: the investigation of risk factors for child undernutrition in India (by a cross-sectional study) and for child overweight and obesity in Germany (by a birth cohort study). In both applications, extreme quantiles of the response variables are modelled by structured additive quantile regression and estimated by quantile boosting. The results are described and discussed in detail.

    Boosted Beta regression

    Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.

    Understanding child stunting in India: a comprehensive analysis of socio-economic, nutritional and environmental determinants using quantile boosting

    BACKGROUND: Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. OBJECTIVE: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. DESIGN: Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. RESULTS: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. CONCLUSIONS: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role

    Differences in BMI z-Scores between Offspring of Smoking and Nonsmoking Mothers: A Longitudinal Study of German Children from Birth through 14 Years of Age

    BACKGROUND: Children of mothers who smoked during pregnancy have a lower birth weight but have a higher chance to become overweight during childhood. OBJECTIVES: We followed children longitudinally to assess the age when higher body mass index (BMI) z-scores became evident in the children of mothers who smoked during pregnancy, and to evaluate the trajectory of changes until adolescence. METHODS: We pooled data from two German cohort studies that included repeated anthropometric measurements until 14 years of age and information on smoking during pregnancy and other risk factors for overweight. We used longitudinal quantile regression to estimate age- and sex-specific associations between maternal smoking and the 10th, 25th, 50th, 75th, and 90th quantiles of the BMI z-score distribution in study participants from birth through 14 years of age, adjusted for potential confounders. We used additive mixed models to estimate associations with mean BMI z-scores. RESULTS: Mean and median (50th quantile) BMI z-scores at birth were smaller in the children of mothers who smoked during pregnancy compared with children of nonsmoking mothers, but BMI z-scores were significantly associated with maternal smoking beginning at the age of 4-5 years, and differences increased over time. For example, the difference in the median BMI z-score between the daughters of smokers versus nonsmokers was 0.12 (95% CI: 0.01, 0.21) at 5 years, and 0.30 (95% CI: 0.08, 0.39) at 14 years of age. For lower BMI z-score quantiles, the association with smoking was more pronounced in girls, whereas in boys the association was more pronounced for higher BMI z-score quantiles. CONCLUSIONS: A clear difference in BMI z-score (mean and median) between children of smoking and nonsmoking mothers emerged at 4-5 years of age. The shape and size of age-specific effect estimates for maternal smoking during pregnancy varied by age and sex across the BMI z-score distribution.